Quantum and Classical Strong Direct Product Theorems 
and Optimal Time-Space Tradeoffs 

Hartmut Klauck    Robert Špalek    Ronald de Wolf

University of Calgary CWI, Amsterdam CWI, Amsterdam 

klauckh@cpsc.ucalgary.ca sr@cwi.nl rdewolf@cwi.nl 



Abstract 



A strong direct product theorem says that if we want to compute $k$ independent instances
of a function, using less than $k$ times the resources needed for one instance, then our overall
success probability will be exponentially small in $k$. We establish such theorems for the classical
as well as quantum query complexity of the OR function. This implies slightly weaker direct
product results for all total functions. We prove a similar result for quantum communication
protocols computing $k$ instances of the Disjointness function.

Our direct product theorems imply a time-space tradeoff $T^2 S = \Omega(N^3)$ for sorting $N$ items
on a quantum computer, which is optimal up to polylog factors. They also give several tight
time-space and communication-space tradeoffs for the problems of Boolean matrix-vector
multiplication and matrix multiplication.

1 Introduction

1.1 Direct product theorems

For every reasonable model of computation one can ask the following fundamental question:

How do the resources that we need for computing $k$ independent instances of $f$ scale
with the resources needed for one instance and with $k$?

Here the notion of "resource" needs to be specified. It could refer to time, space, queries,
communication etc. Similarly we need to define what we mean by "computing $f$", for instance whether we
allow the algorithm some probability of error, and whether this probability of error is average-case
or worst-case.

In this paper we consider two kinds of resources, queries and communication, and allow our
algorithms some error probability. An algorithm is given $k$ inputs $x^1, \ldots, x^k$, and has to output
the vector of $k$ answers $f(x^1), \ldots, f(x^k)$. The issue is how the algorithm can optimally distribute
its resources among the $k$ instances it needs to compute. We focus on the relation between the
total amount $T$ of resources available and the best-achievable success probability $\sigma$ (which could
be average-case or worst-case). Intuitively, if every algorithm with $t$ resources must have some
constant error probability when computing one instance of $f$, then for computing $k$ instances we
expect a constant error on each instance and hence an exponentially small success probability for
the $k$-vector as a whole. Such a statement is known as a weak direct product theorem:

If $T \approx t$, then $\sigma = 2^{-\Omega(k)}$

However, even if we give our algorithm roughly $kt$ resources, on average it still has only $t$ resources
available per instance. So even here we expect a constant error per instance and an exponentially
small success probability overall. Such a statement is known as a strong direct product theorem:

If $T \approx kt$, then $\sigma = 2^{-\Omega(k)}$

Strong direct product theorems, though intuitively very plausible, are generally hard to prove and
sometimes not even true. Shaltiel [Sha01] exhibits a general class of examples where strong direct
product theorems fail. This applies for instance to query complexity, communication complexity,
and circuit complexity. In his examples, success probability is taken under the uniform probability
distribution on inputs. The function is chosen such that for most inputs, most of the $k$ instances
can be computed quickly and without any error probability. This leaves enough resources to solve
the few hard instances with high success probability. Hence for his functions, with $T \approx tk$, one can
achieve average success probability close to 1.

Accordingly, we can only establish direct product theorems in special cases. Examples are Nisan
et al.'s [NRS94] strong direct product theorem for "decision forests", Parnafes et al.'s [PRW97]
direct product theorem for "forests" of communication protocols, Shaltiel's strong direct product
theorems for "fair" decision trees and his discrepancy bound for communication complexity [Sha01].
In the quantum case, Aaronson [Aar04, Theorem 10] established a result for the unordered search
problem that lies in between the weak and the strong theorems: every $T$-query quantum algorithm
for searching $k$ marked items among $N = kn$ input bits will have success probability $\sigma \le O(T^2/N)^k$.
In particular, if $T \ll \sqrt{kn}$, then $\sigma = 2^{-\Omega(k)}$.

Our main contributions in this paper are strong direct product theorems for the OR-function
in various settings. First consider the case of classical randomized algorithms. Let $\mathrm{OR}_n$ denote the
$n$-bit OR-function, and let $f^{(k)}$ denote $k$ independent instances of a function $f$. Any randomized
algorithm with less than, say, $n/2$ queries will have a constant error probability when computing
$\mathrm{OR}_n$. Hence we expect an exponentially small success probability when computing $\mathrm{OR}_n^{(k)}$ using
$\ll kn$ queries. We prove this in Section 3:

SDPT for classical query complexity:

Every randomized algorithm that computes $\mathrm{OR}_n^{(k)}$ using $T \le \alpha kn$ queries has worst-case
success probability $\sigma = 2^{-\Omega(k)}$ (for $\alpha > 0$ a sufficiently small constant).

For simplicity we have stated this result with $\sigma$ being worst-case success probability, but the
statement is also valid for the average success probability under a hard $k$-fold product distribution
that is implicit in our proof.

This statement for OR actually implies a somewhat weaker DPT for all total functions $f$,
via the notion of block sensitivity $bs(f)$. Using techniques of Nisan and Szegedy [NS94], we can
embed $\mathrm{OR}_{bs(f)}$ in $f$ (with the promise that the weight of the OR's input is 0 or 1), while on the
other hand we know that the classical bounded-error query complexity $R_2(f)$ is upper bounded by
$bs(f)^3$ [BBC+01]. This implies:

Every randomized algorithm that computes $f^{(k)}$ using $T \le \alpha k R_2(f)^{1/3}$ queries has
worst-case success probability $\sigma = 2^{-\Omega(k)}$.






This theorem falls short of a true strong direct product theorem in having $R_2(f)^{1/3}$ instead of $R_2(f)$
in the resource bound. However, the other two main aspects of a SDPT remain valid: the linear 
dependence of the resources on k and the exponential decay of the success probability. 

Next we turn our attention to quantum algorithms. Buhrman et al. [BNRW03] actually proved
that roughly $k$ times the resources for one instance suffices to compute $f^{(k)}$ with success probability
close to 1, rather than exponentially small: $Q_2(f^{(k)}) = O(k\,Q_2(f))$, where $Q_2(f)$ denotes the
quantum bounded-error query complexity of $f$ (such a result is not known to hold in the classical
world). For instance, $Q_2(\mathrm{OR}_n) = \Theta(\sqrt{n})$ by Grover's search algorithm, so $O(k\sqrt{n})$ quantum queries
suffice to compute $\mathrm{OR}_n^{(k)}$ with high success probability. In Section 4 we show that if we make
the number of queries slightly smaller, the best-achievable success probability suddenly becomes
exponentially small:

SDPT for quantum query complexity:

Every quantum algorithm that computes $\mathrm{OR}_n^{(k)}$ using $T \le \alpha k\sqrt{n}$ queries has worst-case
success probability $\sigma = 2^{-\Omega(k)}$ (for $\alpha > 0$ a sufficiently small constant).

Our proof uses the polynomial method [BBC+01] and is completely different from the classical proof.
The polynomial method was also used by Aaronson [Aar04] in his proof of a weaker quantum direct
product theorem for the search problem, mentioned above. Our proof takes its starting point from
his proof, analyzing the degree of a single-variate polynomial that is 0 on $\{0, \ldots, k-1\}$, at least
$\sigma$ on $k$, and between 0 and 1 on $\{0, \ldots, kn\}$. The difference between his proof and ours is that we
partially factor this polynomial, which gives us some nice extra properties over Aaronson's approach
of differentiating the polynomial, and we use a strong result of Coppersmith and Rivlin [CR92]. In
both cases (different) extremal properties of Chebyshev polynomials finish the proofs.
Again, using block sensitivity we can obtain a weaker result for all total functions:

Every quantum algorithm that computes $f^{(k)}$ using $T \le \alpha k\,Q_2(f)^{1/6}$ queries has
worst-case success probability $\sigma = 2^{-\Omega(k)}$.

The third and last setting where we establish a strong direct product theorem is quantum
communication complexity. Suppose Alice has an $n$-bit input $x$ and Bob has an $n$-bit input $y$. These $x$ and
$y$ represent sets, and $\mathrm{DISJ}_n(x, y) = 1$ iff those sets are disjoint. Note that $\mathrm{DISJ}_n$ is the negation of
$\mathrm{OR}_n(x \wedge y)$, where $x \wedge y$ is the $n$-bit string obtained by bitwise AND-ing $x$ and $y$. In many ways,
$\mathrm{DISJ}_n$ has the same central role in communication complexity as $\mathrm{OR}_n$ has in query complexity.
In particular, it is "co-NP complete" [BFS86]. The communication complexity of $\mathrm{DISJ}_n$ has been
well studied: it takes $\Theta(n)$ bits of communication in the classical world [KS92, Raz92] and $\Theta(\sqrt{n})$
in the quantum world [BCW98, HW02, AA03, Raz03]. For the case where Alice and Bob want to
compute $k$ instances of Disjointness, we establish a strong direct product theorem in Section 5:

SDPT for quantum communication complexity:

Every quantum protocol that computes $\mathrm{DISJ}_n^{(k)}$ using $T \le \alpha k\sqrt{n}$ qubits of
communication has worst-case success probability $\sigma = 2^{-\Omega(k)}$.

Our proof uses Razborov's [Raz03] lower bound technique to translate the quantum protocol to a
polynomial, at which point the polynomial results established for the quantum query SDPT take
over. We can obtain similar results for other symmetric predicates.

One may also consider algorithms that compute the parity of the $k$ outcomes instead of the
vector of $k$ outcomes. This issue has been well studied, particularly in circuit complexity, and
generally goes under the name of XOR lemmas [Yao82, GNW95]. In this paper we focus mostly
on the vector version, but we can prove similar strong bounds for the parity version. In particular,
we state a classical strong XOR lemma in Section 3.3 and can get similar strong XOR lemmas for
the quantum case using the technique of Cleve et al. [CDNT98, Section 3]. They show how the
ability to compute the parity of any subset of $k$ bits with probability $1/2 + \varepsilon$ suffices to compute
the full $k$-vector with probability $4\varepsilon^2$. Hence our strong quantum direct product theorems imply
strong quantum XOR lemmas.

1.2 Time-Space and Communication-Space tradeoffs 

Apart from answering a fundamental question about the computational models of (quantum) query 
complexity and communication complexity, our direct product theorems also imply a number of 
new and optimal time-space tradeoffs. 

First, we consider the tradeoff between the time $T$ and space $S$ that a quantum circuit needs
for sorting $N$ numbers. Classically, it is well known that $TS = \Omega(N^2)$ and that this tradeoff is
achievable [Bea91]. In the quantum case, Klauck [Kla03] constructed a bounded-error quantum
algorithm that runs in time $T = O((N \log N)^{3/2} / \sqrt{S})$ for all $(\log N)^3 \le S \le N / \log N$. He also
showed¹ a lower bound $TS = \Omega(N^{3/2})$, which is close to optimal for small $S$ but not for large $S$.
We use our strong direct product theorem to establish the tradeoff $T^2 S = \Omega(N^3)$. This is tight up
to polylogarithmic factors.

Secondly, we consider time-space and communication-space tradeoffs for the problems of Boolean
matrix-vector product and Boolean matrix product. In the first problem there are an $N \times N$ matrix
$A$ and a vector $b$ of dimension $N$, and the goal is to compute the vector $c = Ab$, where
$c_i = \bigvee_{j=1}^{N} (A[i,j] \wedge b_j)$. In the setting of time-space tradeoffs, the matrix $A$ is fixed and the input is the
vector $b$. In the problem of matrix multiplication two matrices have to be multiplied with the same
type of Boolean product, and both are inputs.

Time-space tradeoffs for Boolean matrix-vector multiplication have been analyzed in an
average-case scenario by Abrahamson [Abr90], whose results give a worst-case lower bound of $TS = \Omega(N^{3/2})$
for classical algorithms. He conjectured that a worst-case lower bound of $TS = \Omega(N^2)$ holds. Using
our classical direct product result we are able to confirm this, i.e., there is a matrix $A$ such that
computing $Ab$ requires $TS = \Omega(N^2)$. We also show a lower bound of $T^2 S = \Omega(N^3)$ for this
problem in the quantum case. Both bounds are tight (the second within a logarithmic factor) if $T$
is taken to be the number of queries to the inputs. We also get a lower bound of $T^2 S = \Omega(N^5)$ for
the problem of multiplying two matrices in the quantum case. This bound is close to optimal for
small $S$; it is open whether it is close to optimal for large $S$.

Research on communication-space tradeoffs in the classical setting has been initiated by Lam
et al. [LTT92] in a restricted setting, and by Beame et al. [BTY94] in a general model of
space-bounded communication complexity. In the setting of communication-space tradeoffs, players Alice
and Bob are modeled as space-bounded circuits, and we are interested in the communication cost
when given particular space bounds. For the problem of computing the matrix-vector product Alice
receives the matrix $A$ (now an input) and Bob the vector $b$. Beame et al. gave tight lower bounds
e.g. for the matrix-vector product and matrix product over GF(2), but stated the complexity
of Boolean matrix-vector multiplication as an open problem. Using our direct product result
for quantum communication complexity we are able to show that any quantum protocol for this
problem satisfies $C^2 S = \Omega(N^3)$. This is tight within a polylogarithmic factor. We also get a lower
bound of $C^2 S = \Omega(N^5)$ for computing the product of two matrices, which again is tight.

¹ Unfortunately there is an error in the proof presented in [Kla03], namely Lemma 5 appears to be wrong.

Note that no classical lower bounds for these problems were known previously, and that finding 
better classical lower bounds than these remains open. The possibility to show good quantum 
bounds comes from the deep relation between quantum protocols and polynomials implicit in 
Razborov's lower bound technique [Raz03]. 

2 Preliminaries 

2.1 Quantum query algorithms 

We assume familiarity with quantum computing [NC00] and sketch the model of quantum query
complexity, referring to [BW02] for more details, also on the close relation between query complexity
and degrees of multivariate polynomials. Suppose we want to compute some function $f$. For input
$x \in \{0,1\}^N$, a query gives us access to the input bits. It corresponds to the unitary transformation

$O : |i, b, z\rangle \mapsto |i, b \oplus x_i, z\rangle$.

Here $i \in [N] = \{1, \ldots, N\}$ and $b \in \{0,1\}$; the $z$-part corresponds to the workspace, which is
not affected by the query. We assume the input can be accessed only via such queries. A
$T$-query quantum algorithm has the form $A = U_T O U_{T-1} \cdots O U_1 O U_0$, where the $U_k$ are fixed unitary
transformations, independent of $x$. This $A$ depends on $x$ via the $T$ applications of $O$. The algorithm
starts in initial $S$-qubit state $|0\rangle$ and its output is the result of measuring a dedicated part of the
final state $A|0\rangle$. For a Boolean function $f$, the output of $A$ is obtained by observing the leftmost
qubit of the final superposition $A|0\rangle$, and its acceptance probability on input $x$ is its probability of
outputting 1.
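The action of the query operator on computational basis states can be illustrated with a small classical sketch. This is only an illustration under our own assumptions: the function names and the dictionary representation of a superposition are hypothetical and not part of the paper.

```python
def apply_query(x, basis_state):
    """Action of O on a basis state |i, b, z>: flip b iff x_i = 1 (i is 1-based)."""
    i, b, z = basis_state
    return (i, b ^ x[i - 1], z)


def apply_query_to_state(x, amplitudes):
    """Apply O to a superposition given as {basis_state: amplitude}.

    O permutes the basis states, so amplitudes are simply relabelled.
    """
    return {apply_query(x, basis): amp for basis, amp in amplitudes.items()}


# Example: a 3-bit input and a uniform superposition over the index register.
x = [0, 1, 1]
state = {(i, 0, 0): 3 ** -0.5 for i in (1, 2, 3)}
queried = apply_query_to_state(x, state)
```

Applying the query twice returns the original state, which reflects that $O$ is its own inverse and in particular unitary.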

One of the most interesting quantum query algorithms is Grover's search algorithm [Gro96,
BBHT98]. It can find an index of a 1-bit in an $n$-bit input in expected number of $O(\sqrt{n/(|x|+1)})$
queries, where $|x|$ is the Hamming weight (number of ones) in the input. If we know that $|x| \le 1$,
we can solve the search problem exactly using $\frac{\pi}{4}\sqrt{n}$ queries [BHMT02].

For investigating time-space tradeoffs we use the circuit model. A circuit accesses its input via 
an oracle like a query algorithm. Time corresponds to the number of gates in the circuit. We will, 
however, usually consider the number of queries to the input, which is obviously a lower bound on 
time. A quantum circuit uses space S if it works with S qubits only. We require that the outputs 
are made at predefined gates in the circuit, by writing their value to some extra qubits that may 
not be used later on. Similar definitions are made for classical circuits. 

2.2 Communicating quantum circuits 

In the model of quantum communication complexity, two players Alice and Bob compute a function 
/ on distributed inputs x and y. The complexity measure of interest in this setting is the amount 
of communication. The players follow some predefined protocol that consists of local unitary 
operations, and the exchange of qubits. The communication cost of a protocol is the maximal 
number of qubits exchanged for any input. In the standard model of communication complexity, 
Alice and Bob are computationally unbounded entities, but we are also interested in what happens
if they have bounded memory, i.e., they work with a bounded number of qubits. To this end we
model Alice and Bob as communicating quantum circuits, following Yao [Yao93].

A pair of communicating quantum circuits is actually a single quantum circuit partitioned into 
two parts. The allowed operations are local unitary operations and access to the inputs that are 
given by oracles. Alice's part of the circuit may use oracle gates to read single bits from her input, 
and Bob's part of the circuit may do so for his input. The communication C between the two parties 
is simply the number of wires carrying qubits that cross between the two parts of the circuit. A 
pair of communicating quantum circuits uses space S, if the whole circuit works on S qubits. 

In the problems we consider, the number of outputs is much larger than the memory of the 
players. Therefore we use the following output convention. The player who computes the value of 
an output sends this value to the other player at a predetermined point in the protocol. In order to 
make the models as general as possible, we furthermore allow the players to do local measurements, 
and to throw qubits away as well as pick up some fresh qubits. The space requirement only demands 
that at any given time no more than S qubits are in use in the whole circuit. 

A final comment regarding upper bounds: Buhrman et al. [BCW98] showed how to run a query
algorithm in a distributed fashion with small overhead in the communication. In particular, if there
is a $T$-query quantum algorithm computing an $N$-bit function $f$, then there is a pair of communicating
quantum circuits with $O(T \log N)$ communication that computes $f(x \wedge y)$ with the same success
probability. We refer to the book of Kushilevitz and Nisan [KN97] for more on communication
complexity in general, and to the surveys [Kla00, Buh00, Wol02] for more on its quantum variety.

3 Strong Direct Product Theorem for Classical Queries 

In this section we prove a strong direct product theorem for classical randomized algorithms
computing $k$ independent instances of $\mathrm{OR}_n$. By Yao's principle, it is sufficient to prove it for deterministic
algorithms under a fixed hard input distribution.

3.1 Non-adaptive algorithms 

We first establish a strong direct product theorem for non-adaptive algorithms. We call an algorithm
non-adaptive if, for each of the $k$ input blocks, the maximum number of queries in that block is
fixed before the first query. Let $\mathrm{Suc}_{t,\mu}(f)$ be the success probability of the best algorithm for $f$
under $\mu$ that queries at most $t$ input bits.

Lemma 1 Let $f : \{0,1\}^n \to \{0,1\}$ and $\mu$ be an input distribution. Every non-adaptive
deterministic algorithm for $f^{(k)}$ under $\mu^k$ with $T \le kt$ queries has success probability $\sigma \le \mathrm{Suc}_{t,\mu}(f)^k$.

Proof. The proof has two steps. First, we prove by induction that non-adaptive algorithms for $f^{(k)}$
under a general product distribution $\mu_1 \times \cdots \times \mu_k$ that spend $t_i$ queries in $x^i$ have success probability
$\le \prod_{i=1}^{k} \mathrm{Suc}_{t_i,\mu_i}(f)$. Second, we argue that, when $\mu_i = \mu$, the value is maximal for $t_i = t$.

Following [Sha01, Lemma 7], we prove the first part by induction on $T = t_1 + \cdots + t_k$. If $T = 0$,
then the algorithm has to guess $k$ independent random variables $x^i \sim \mu_i$. The probability of success
is equal to the product of the individual success probabilities, i.e. $\prod_{i=1}^{k} \mathrm{Suc}_{0,\mu_i}(f)$.

For the induction step $T \Rightarrow T + 1$: pick some $t_i \ne 0$ and consider two input distributions $\mu'_{i,0}$
and $\mu'_{i,1}$ obtained from $\mu_i$ by fixing the queried bit $x^i_j$. By the induction hypothesis, for each value
$b \in \{0,1\}$, there is an optimal non-adaptive algorithm $A_b$ that achieves the success probability
$\mathrm{Suc}_{t_i-1,\mu'_{i,b}}(f) \cdot \prod_{l \ne i} \mathrm{Suc}_{t_l,\mu_l}(f)$. We construct a new algorithm $A$ that calls $A_b$ as a subroutine
after it has queried $x^i_j$ with $b$ as an outcome. $A$ is optimal and it has success probability

$\left( \sum_{b=0}^{1} \Pr_{\mu_i}[x^i_j = b] \cdot \mathrm{Suc}_{t_i-1,\mu'_{i,b}}(f) \right) \cdot \prod_{l \ne i} \mathrm{Suc}_{t_l,\mu_l}(f) = \prod_{i=1}^{k} \mathrm{Suc}_{t_i,\mu_i}(f)$.

For symmetry reasons, if all $k$ instances are independent and identically distributed, then
the optimal distribution of queries $t_1 + \cdots + t_k = kt$ is uniform, i.e. $t_i = t$. In such a case, the
algorithm achieves the success probability $\mathrm{Suc}_{t,\mu}(f)^k$. □



3.2 Adaptive algorithms 

In this section we prove a similar statement also for adaptive algorithms. 

Remark. The strong direct product theorem is not always true for adaptive algorithms.
Following [Sha01], define $h(x) = x_1 \vee (x_2 \oplus \cdots \oplus x_n)$. Clearly $\mathrm{Suc}_{\frac{2}{3}n,\mu}(h) = 3/4$ for $\mu$ uniform. By a
Chernoff bound, $\mathrm{Suc}_{\frac{2}{3}nk,\mu^k}(h^{(k)}) = 1 - 2^{-\Omega(k)}$, because approximately half of the blocks can be
solved using just 1 query and the unused queries can be used to answer exactly also the other half
of the blocks.
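The remark can be checked numerically with a short sketch. It assumes one particular adaptive strategy (our illustration, not spelled out in the paper): query $x_1$ in every block, then read the remaining $n-1$ bits of each block whose first bit is 0. The function name is hypothetical, and the query budget of $\frac{2}{3}nk$ is taken as an assumption.

```python
from fractions import Fraction
from math import comb


def h_direct_product_success(n: int, k: int) -> Fraction:
    """Probability that the adaptive strategy answers all k instances of h exactly.

    Under the uniform distribution the number Z of blocks with x_1 = 0 is
    Binomial(k, 1/2). The strategy needs k + Z * (n - 1) queries and is
    certainly correct whenever this fits into the budget of (2/3) * n * k.
    """
    budget = Fraction(2 * n * k, 3)
    total = Fraction(0)
    for z in range(k + 1):
        if k + z * (n - 1) <= budget:
            total += Fraction(comb(k, z), 2 ** k)
    return total


for k in (30, 60, 120):
    print(k, float(h_direct_product_success(30, k)))
```

The printed values approach 1 as $k$ grows, whereas a single instance with $\frac{2}{3}n$ queries succeeds only with probability 3/4.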

However, the strong direct product theorem is valid for $\mathrm{OR}_n^{(k)}$ under $\nu^k$, where $\nu(0^n) = 1/2$
and $\nu(e_i) = 1/(2n)$ for $e_i$ an $n$-bit string that contains a 1 only at the $i$-th position. It is simple to
prove that $\mathrm{Suc}_{\alpha n,\nu}(\mathrm{OR}_n) = \frac{\alpha+1}{2}$. Non-adaptive algorithms for $\mathrm{OR}_n^{(k)}$ under $\nu^k$ with $\alpha kn$ queries
thus have $\sigma \le \left(\frac{\alpha+1}{2}\right)^k = 2^{-\log\left(\frac{2}{\alpha+1}\right) k}$. We can achieve any $\gamma < 1$ by choosing $\alpha$ sufficiently small.
We prove that adaptive algorithms cannot be much better. Without loss of generality, we assume:

1. The adaptive algorithm is deterministic. By Yao's principle [Yao77], if there exists a random- 
ized algorithm with success probability a under some input distribution, then there exists a 
deterministic algorithm with success probability a under that distribution. 

2. Whenever the algorithm finds a 1 in some input block, it stops querying that block. 

3. The algorithm spends the same number of queries in all blocks where it does not find a 1. 
This is optimal due to the symmetry between the blocks, and implies that the algorithm 
spends at least as many queries in each "empty" input block as in each "non-empty" block. 
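The single-block success probability under the hard distribution $\nu$ can be verified by direct enumeration. The sketch below is an illustration with hypothetical function names; it assumes a deterministic algorithm that queries the first $t$ positions and answers 1 exactly when it sees a 1, which is the best guess because the all-zero input remains the most likely one after $t$ zeros.

```python
from fractions import Fraction


def suc_or(n: int, t: int) -> Fraction:
    """Success probability for OR_n under nu when querying positions 0..t-1."""
    assert 0 <= t <= n
    # Input 0^n has probability 1/2: no 1 is seen and the answer 0 is correct.
    success = Fraction(1, 2)
    for i in range(n):
        # Input e_i has probability 1/(2n); it is answered correctly
        # only if the single 1 lies among the queried positions.
        if i < t:
            success += Fraction(1, 2 * n)
    return success


def nonadaptive_bound(alpha: Fraction, k: int) -> Fraction:
    """The bound ((alpha + 1) / 2)^k for k blocks with alpha * n queries each."""
    return ((alpha + 1) / 2) ** k


print(suc_or(10, 3))                          # matches (alpha + 1) / 2 with alpha = 3/10
print(float(nonadaptive_bound(Fraction(3, 10), 20)))
```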

Lemma 2 If there is an adaptive $T$-query algorithm $A$ computing $\mathrm{OR}_n^{(k)}$ under $\nu^k$ with success
probability $\sigma$, then there is a non-adaptive $3T$-query algorithm $A'$ computing $\mathrm{OR}_n^{(k)}$ with success
probability $\sigma - 2^{-\Omega(k)}$.

Proof. Let $Z$ be the number of empty blocks. $E[Z] = k/2$ and, by a Chernoff bound,
$\delta = \Pr[Z < k/3] = 2^{-\Omega(k)}$. If $Z \ge k/3$, then $A$ spends at most $3T/k$ queries in each empty block.
Define non-adaptive $A'$ that spends $3T/k$ queries in each block. Then $A'$ queries all the positions
that $A$ queries, and maybe some more. Compare the overall success probabilities of $A$ and $A'$:

$\sigma_A = \Pr[Z < k/3] \cdot \Pr[A \text{ succeeds} \mid Z < k/3] + \Pr[Z \ge k/3] \cdot \Pr[A \text{ succeeds} \mid Z \ge k/3]$
$\le \delta \cdot 1 + \Pr[Z \ge k/3] \cdot \Pr[A' \text{ succeeds} \mid Z \ge k/3]$
$\le \delta + \sigma_{A'}$.

We conclude that $\sigma_{A'} \ge \sigma_A - \delta$. (Remark. By replacing the $k/3$-bound on $Z$ by a $\beta k$-bound for
some $\beta > 0$, we can obtain arbitrary $\gamma < 1$ in the exponent $\delta = 2^{-\gamma k}$, while the number of queries
of $A'$ becomes $T/\beta$.) □
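The quantity $\delta$ in the proof can be computed exactly rather than bounded. The following sketch is an illustration with a hypothetical function name; it assumes, as under $\nu^k$, that each block is empty independently with probability 1/2.

```python
from fractions import Fraction
from math import comb


def prob_few_empty_blocks(k: int, beta: Fraction = Fraction(1, 3)) -> Fraction:
    """Exact delta = Pr[Z < beta * k] for Z ~ Binomial(k, 1/2)."""
    return sum(
        (Fraction(comb(k, z), 2 ** k) for z in range(k + 1) if z < beta * k),
        Fraction(0),
    )


for k in (30, 60, 90, 120):
    print(k, float(prob_few_empty_blocks(k)))
```

The printed values shrink rapidly with $k$, in line with the Chernoff bound; passing a smaller `beta` makes the decay faster, at the price of the larger query count $T/\beta$ for $A'$.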

Combining the two lemmas establishes the following theorem: 

Theorem 3 (SDPT for OR) For every $0 < \gamma < 1$, there exists an $\alpha > 0$ such that every
randomized algorithm for $\mathrm{OR}_n^{(k)}$ with $T \le \alpha kn$ queries has success probability $\sigma \le 2^{-\gamma k}$.

3.3 A bound for the parity instead of the vector of results 

Here we give a strong direct product theorem for the parity of $k$ independent instances of $\mathrm{OR}_n$.
The parity is a Boolean variable, hence we can always guess it with probability at least $\frac{1}{2}$. However,
we prove that the advantage (instead of the success probability) of our guess must be exponentially
small.

Let $X$ be a random bit with $\Pr[X = 1] = p$. We define the advantage of $X$ by $\mathrm{Adv}(X) =
|2p - 1|$. Note that a uniformly distributed random bit has advantage 0 and a bit known with
certainty has advantage 1. It is well known that if $X_1, \ldots, X_k$ are independent random bits, then
$\mathrm{Adv}(X_1 \oplus \cdots \oplus X_k) = \prod_{i=1}^{k} \mathrm{Adv}(X_i)$. Compare this with the fact that the probability of guessing
correctly the complete vector $(X_1, \ldots, X_k)$ is the product of the individual probabilities.
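The multiplicativity of the advantage can be confirmed by brute force for small $k$. This is a minimal sketch with hypothetical function names; it enumerates all outcomes of the independent bits using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def adv(p: Fraction) -> Fraction:
    """Advantage |2p - 1| of a bit that equals 1 with probability p."""
    return abs(2 * p - 1)


def xor_prob_one(ps):
    """Exact Pr[X_1 xor ... xor X_k = 1] for independent bits with Pr[X_i = 1] = ps[i]."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(ps)):
        if sum(bits) % 2 == 1:
            weight = Fraction(1)
            for b, p in zip(bits, ps):
                weight *= p if b else 1 - p
            total += weight
    return total


ps = [Fraction(3, 4), Fraction(2, 3), Fraction(9, 10)]
lhs = adv(xor_prob_one(ps))
rhs = adv(ps[0]) * adv(ps[1]) * adv(ps[2])
print(lhs, rhs)
```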

We have proved a lower bound for the computation of $\mathrm{OR}_n^{(k)}$ (vector of OR's). By the same
technique, replacing the success probability by the advantage in all claims and proofs, we can also
prove a lower bound for the computation of $\mathrm{OR}_n^{\oplus k}$ (parity of OR's).

Theorem 4 (SDPT for parity of OR's) For every $0 < \gamma < 1$, there exists an $\alpha > 0$ such that
every randomized algorithm for $\mathrm{OR}_n^{\oplus k}$ with $T \le \alpha kn$ queries has advantage $\tau \le 2^{-\gamma k}$.

3.4 A bound for all functions 

Here we show that the strong direct product theorem for OR actually implies a weaker direct
product theorem for all functions. In this weaker version, the success probability of computing $k$
instances still goes down exponentially with $k$, but we need to start from a polynomially smaller
bound on the overall number of queries.

Definition 1 For $x \in \{0,1\}^n$ and $S \subseteq [n]$, we use $x^S$ to denote the $n$-bit string obtained from $x$
by flipping the bits in $S$. Consider a (possibly partial) function $f : \mathcal{D} \to Z$, with $\mathcal{D} \subseteq \{0,1\}^n$. The
block sensitivity $bs_x(f)$ of $x \in \mathcal{D}$ is the maximal $b$ for which there are disjoint sets $S_1, \ldots, S_b$ such
that $f(x) \ne f(x^{S_i})$. The block sensitivity of $f$ is $\max_{x \in \mathcal{D}} bs_x(f)$.
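For very small $n$ the definition can be evaluated by exhaustive search. The sketch below is an illustration only: it assumes a total function given as a Python callable on inputs encoded as integers (bit $i$ of the integer is $x_{i+1}$), and its running time is exponential, so it is meant for $n$ of about 4 or less. The function name is hypothetical.

```python
from typing import Callable, List


def block_sensitivity(f: Callable[[int], int], n: int) -> int:
    """Brute-force bs(f) for a total function f on n-bit inputs encoded as ints."""

    def max_disjoint(blocks: List[int], used: int, start: int) -> int:
        # Largest number of pairwise disjoint bitmasks among blocks[start:],
        # all of which must also avoid the positions already in `used`.
        best = 0
        for idx in range(start, len(blocks)):
            if blocks[idx] & used == 0:
                best = max(best, 1 + max_disjoint(blocks, used | blocks[idx], idx + 1))
        return best

    best = 0
    for x in range(2 ** n):
        # All non-empty blocks S (as bitmasks) whose flipping changes f at x.
        sensitive = [s for s in range(1, 2 ** n) if f(x ^ s) != f(x)]
        best = max(best, max_disjoint(sensitive, 0, 0))
    return best


or3 = lambda x: int(x != 0)
maj3 = lambda x: int(bin(x).count("1") >= 2)
print(block_sensitivity(or3, 3), block_sensitivity(maj3, 3))
```

For $\mathrm{OR}_n$ the maximum is attained at the all-zero input with the $n$ singleton blocks, which is the structure exploited by the embedding of OR into $f$.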

Block sensitivity is closely related to deterministic and bounded-error classical query complexity: 

Theorem 5 ([Nis91, BBC+01]) $R_2(f) = \Omega(bs(f))$ for all $f$, $D(f) \le bs(f)^3$ for all total Boolean $f$.

Nisan and Szegedy [NS94] showed how to embed a $bs(f)$-bit OR-function (with the promise
that the input has weight $\le 1$) into $f$. Combined with our strong direct product theorem for OR,
this implies a direct product theorem for all functions in terms of their block sensitivity:

Theorem 6 For every $0 < \gamma < 1$, there exists an $\alpha > 0$ such that for every $f$, every classical
algorithm for $f^{(k)}$ with $T \le \alpha k\,bs(f)$ queries has success probability $\sigma \le 2^{-\gamma k}$.

This is optimal whenever $R_2(f) = \Theta(bs(f))$, which is the case for most functions. For total
functions, the gap between $R_2(f)$ and $bs(f)$ is not more than cubic, hence

Corollary 7 For every $0 < \gamma < 1$, there exists an $\alpha > 0$ such that for every total Boolean $f$, every
classical algorithm for $f^{(k)}$ with $T \le \alpha k R_2(f)^{1/3}$ queries has success probability $\sigma \le 2^{-\gamma k}$.

4 Strong Direct Product Theorem for Quantum Queries 

In this section we prove a strong direct product theorem for quantum algorithms computing $k$
independent instances of OR. Our proof relies on the polynomial method of [BBC+01].

4.1 Bounds on polynomials 

We use three results about polynomials, also used in [BCWZ99]. The first is by Coppersmith and 
Rivlin [CR92, p. 980] and gives a general bound for polynomials bounded by 1 at integer points: 

Theorem 8 (Coppersmith & Rivlin [CR92]) Every polynomial $p$ of degree $d \le n$ that has
absolute value

$|p(i)| \le 1$ for all integers $i \in [0, n]$,

satisfies

$|p(x)| < a e^{b d^2 / n}$ for all real $x \in [0, n]$,

where $a, b > 0$ are universal constants (no explicit values for $a$ and $b$ are given in [CR92]).

The other two results concern the Chebyshev polynomials $T_d$, defined as in [Riv90]:

$T_d(x) = \frac{1}{2}\left( \left(x + \sqrt{x^2 - 1}\right)^d + \left(x - \sqrt{x^2 - 1}\right)^d \right)$.

$T_d$ has degree $d$ and its absolute value $|T_d(x)|$ is bounded by 1 if $x \in [-1, 1]$. On the interval $[1, \infty)$,
$T_d$ exceeds all other polynomials with those two properties ([Riv90, p. 108] and [Pat92, Fact 2]):

Theorem 9 If $q$ is a polynomial of degree $d$ such that $|q(x)| \le 1$ for all $x \in [-1, 1]$ then $|q(x)| \le
|T_d(x)|$ for all $x \ge 1$.

Paturi [Pat92, before Fact 2] proved

Lemma 10 (Paturi [Pat92]) $T_d(1 + \mu) \le e^{2d\sqrt{2\mu + \mu^2}}$ for all $\mu \ge 0$.

Proof. For $x = 1 + \mu$: $T_d(x) \le \left(x + \sqrt{x^2 - 1}\right)^d = \left(1 + \mu + \sqrt{2\mu + \mu^2}\right)^d \le \left(1 + 2\sqrt{2\mu + \mu^2}\right)^d \le
e^{2d\sqrt{2\mu + \mu^2}}$ (using that $1 + z \le e^z$ for all real $z$). □
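Lemma 10 is easy to sanity-check numerically. The sketch below evaluates the closed form of $T_d$ for $x \ge 1$ and compares it with Paturi's bound on a small grid; the function names and the grid are our own choices for illustration.

```python
from math import exp, sqrt


def chebyshev(d: int, x: float) -> float:
    """T_d(x) for x >= 1, evaluated via the closed form with square roots."""
    root = sqrt(x * x - 1.0)
    return 0.5 * ((x + root) ** d + (x - root) ** d)


def paturi_bound(d: int, mu: float) -> float:
    """Upper bound exp(2 d sqrt(2 mu + mu^2)) on T_d(1 + mu)."""
    return exp(2 * d * sqrt(2 * mu + mu * mu))


grid_ok = all(
    chebyshev(d, 1 + mu) <= paturi_bound(d, mu)
    for d in range(0, 12)
    for mu in (0.0, 0.001, 0.01, 0.1, 0.5, 1.0, 3.0)
)
print(grid_ok)
```

The closed form agrees with the familiar polynomial expressions, for example $T_2(x) = 2x^2 - 1$ and $T_3(x) = 4x^3 - 3x$.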
The following key lemma is the basis for all our direct product theorems:

Lemma 11 Suppose $p$ is a degree-$D$ polynomial such that for some $\delta \ge 0$

$-\delta \le p(i) \le \delta$ for all $i \in \{0, \ldots, k-1\}$,
$p(k) = \sigma$,
$p(i) \in [-\delta, 1 + \delta]$ for all $i \in \{0, \ldots, N\}$.

Then for every integer $1 \le C < N - k$ and $\mu = 2C/(N - k - C)$ we have

$\sigma \le a \left( 1 + \delta + \frac{\delta (2N)^k}{(k-1)!} \right) \cdot \exp\left( \frac{b(D-k)^2}{N-k-C} + 2(D-k)\sqrt{2\mu + \mu^2} - k \ln(C/k) \right) + \delta k 2^{k-1}$,

where $a, b$ are the constants given by Theorem 8.

Before establishing this gruesome bound, let us reassure the reader by noting that we will apply
this lemma with $\delta$ negligibly small, $D = \alpha\sqrt{kN}$ for sufficiently small $\alpha$, and $C = k e^{\gamma+1}$, giving

$\sigma \le \exp\left( (b\alpha^2 + 4\alpha e^{\gamma/2 + 1/2} - 1 - \gamma) k \right) \le e^{-\gamma k} \le 2^{-\gamma k}$.
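To see where the three terms in this exponent come from, the substitution into Lemma 11 can be sketched as follows. This is an informal calculation under our own simplifying assumptions, not a step taken from the paper: we set $\delta = 0$, assume $k + C \ll N$ so that $N - k - C \approx N$, and ignore the constant prefactor $a$ and lower-order terms.

```latex
% Assumptions (ours): \delta = 0, D = \alpha\sqrt{kN}, C = k e^{\gamma+1}, k + C \ll N.
\begin{align*}
\frac{b(D-k)^2}{N-k-C} &\lesssim \frac{b\,\alpha^2 k N}{N} = b\alpha^2 k, \\
\mu = \frac{2C}{N-k-C} \approx \frac{2C}{N}
  \quad\Longrightarrow\quad
2(D-k)\sqrt{2\mu+\mu^2} &\lesssim 2\alpha\sqrt{kN}\,\sqrt{\frac{4C}{N}}
  = 4\alpha\sqrt{kC} = 4\alpha e^{\gamma/2+1/2}\,k, \\
-k\ln(C/k) &= -(\gamma+1)\,k .
\end{align*}
% Summing the three exponents gives (b\alpha^2 + 4\alpha e^{\gamma/2+1/2} - 1 - \gamma) k,
% which is at most -\gamma k once b\alpha^2 + 4\alpha e^{\gamma/2+1/2} \le 1.
```

The last condition holds for every fixed $\gamma$ once $\alpha$ is chosen small enough, which is why $\alpha$ depends on $\gamma$ in the theorems that follow.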



Proof of Lemma 11. Divide $p$ with remainder by $\prod_{j=0}^{k-1}(x - j)$ to obtain

$p(x) = q(x) \prod_{j=0}^{k-1} (x - j) + r(x)$,

where $d = \deg(q) = D - k$ and $\deg(r) \le k - 1$. We know that $r(x) = p(x) \in [-\delta, \delta]$ for all
$x \in \{0, \ldots, k-1\}$. Decompose $r$ as a linear combination of polynomials $e_i$, where $e_i(i) = 1$ and
$e_i(x) = 0$ for $x \in \{0, \ldots, k-1\} - \{i\}$:

$r(x) = \sum_{i=0}^{k-1} p(i) e_i(x) = \sum_{i=0}^{k-1} p(i) \prod_{j=0, j \ne i}^{k-1} \frac{x - j}{i - j}$.

We bound the values of $r$ for all real $x \in [0, N]$ by

$|r(x)| \le \sum_{i=0}^{k-1} \frac{|p(i)|}{i!\,(k-1-i)!} \prod_{j=0, j \ne i}^{k-1} |x - j| \le \frac{\delta}{(k-1)!} \sum_{i=0}^{k-1} \binom{k-1}{i} N^k \le \frac{\delta (2N)^k}{(k-1)!}$,
$|r(k)| \le \delta k 2^{k-1}$.

This implies the following about the values of the polynomial $q$:

$|q(k)| \ge (\sigma - \delta k 2^{k-1}) / k!$

In particular:

$|q(i)| \le C^{-k} \left( 1 + \delta + \frac{\delta (2N)^k}{(k-1)!} \right) = A$ for $i \in \{k + C, \ldots, N\}$

Theorem 8 implies that there are constants $a, b > 0$ such that

$|q(x)| \le A \cdot a e^{b d^2 / (N-k-C)} = B$ for all real $x \in [k + C, N]$.

We now divide $q$ by $B$ to normalize it, and rescale the interval $[k + C, N]$ to $[1, -1]$ to get a degree-$d$
polynomial $t$ satisfying

$|t(x)| \le 1$ for all $x \in [-1, 1]$
$t(1 + \mu) = q(k)/B$ for $\mu = 2C/(N - k - C)$

Since $t$ cannot grow faster than the degree-$d$ Chebyshev polynomial, we get

$t(1 + \mu) \le T_d(1 + \mu) \le e^{2d\sqrt{2\mu + \mu^2}}$

Combining our upper and lower bounds on $t(1 + \mu)$, we obtain

$\frac{(\sigma - \delta k 2^{k-1}) / k!}{C^{-k} \left( 1 + \delta + \frac{\delta (2N)^k}{(k-1)!} \right) a e^{b d^2 / (N-k-C)}} \le e^{2d\sqrt{2\mu + \mu^2}}$

Rearranging, and using $k!/C^k \le (k/C)^k = e^{-k \ln(C/k)}$, gives the bound. □



4.2 Consequences for quantum algorithms 

The previous result about polynomials implies a strong tradeoff between queries and success
probability for quantum algorithms that have to find $k$ ones in an $N$-bit input. A $k$-threshold algorithm
with success probability $\sigma$ is an algorithm on $N$-bit input $x$, that outputs 0 with certainty if $|x| < k$,
and outputs 1 with probability at least $\sigma$ if $|x| = k$.

Theorem 12 For every $\gamma > 0$, there exists an $\alpha > 0$ such that every quantum $k$-threshold algorithm
with $T \le \alpha\sqrt{kN}$ queries has success probability $\sigma \le 2^{-\gamma k}$.

Proof. Fix $\gamma > 0$ and consider a $T$-query $k$-threshold algorithm. By [BBC+01], its acceptance
probability is an $N$-variate polynomial of degree $D \le 2T \le 2\alpha\sqrt{kN}$ and can be symmetrized to a
single-variate polynomial $p$ with the properties

$p(i) = 0$ if $i \in \{0, \ldots, k-1\}$
$p(k) \ge \sigma$
$p(i) \in [0, 1]$ for all $i \in \{0, \ldots, N\}$

Choosing $\alpha > 0$ sufficiently small and $\delta = 0$, the result follows from Lemma 11. □

This implies a strong direct product theorem for k instances of the n-bit search problem: 

Theorem 13 (SQDPT for Search) For every 7 > 0, there exists an a > Q such that every 
quantum algorithm for Searchn'^^ with T < ak\fn queries has success probability a < 2~'^^ . 
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(k) 

Proof. Set N = kn, fix a $\gamma > 0$ and a T-query algorithm A for $\mathrm{Search}_n^{(k)}$ with success probability
$\sigma$. Now consider the following algorithm that acts on an N-bit input x:

1. Apply a random permutation $\pi$ to x.

2. Run A on $\pi(x)$.

3. Query each of the k positions that A outputs, and return 1 iff at least k/2 of those bits are 1.

This uses T + k queries. We will show that it is a k/2-threshold algorithm. First, if |x| < k/2, it
always outputs 0. Second, consider the case |x| = k/2. The probability that $\pi$ puts all k/2 ones in
distinct n-bit blocks is

$\frac{N}{N} \cdot \frac{N-n}{N-1} \cdots \frac{N-(k/2-1)n}{N-(k/2-1)} \ge \left(\frac{N-\frac{k}{2}n}{N}\right)^{k/2} = 2^{-k/2}$

Hence our algorithm outputs 1 with probability at least $\sigma 2^{-k/2}$. Choosing $\alpha$ sufficiently small, the
previous theorem implies $\sigma 2^{-k/2} \le 2^{-(\gamma+1/2)k}$, hence $\sigma \le 2^{-\gamma k}$. □
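The probability that a random permutation spreads the k/2 ones over distinct blocks can be computed exactly for small parameters. The sketch below is illustrative only: the function name is hypothetical, and it assumes k is even and N = kn as in the proof.

```python
from fractions import Fraction


def prob_distinct_blocks(n: int, k: int) -> Fraction:
    """Probability that k/2 ones, placed uniformly at random among N = k*n
    positions, all land in different n-bit blocks (k assumed even)."""
    N = k * n
    prob = Fraction(1)
    for j in range(k // 2):
        # j blocks are already occupied: N - j*n allowed positions
        # remain out of the N - j positions that are still free.
        prob *= Fraction(N - j * n, N - j)
    return prob


# Spot-check the lower bound 2^{-k/2} from the proof.
for n in (1, 2, 8, 64):
    for k in (2, 4, 10):
        assert prob_distinct_blocks(n, k) >= Fraction(1, 2 ** (k // 2))
```

Each factor is at least 1/2 because at most k/2 - 1 of the k blocks are occupied at any step, which is where the bound $2^{-k/2}$ comes from.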

Our bounds are quite precise for $\alpha \ll 1$. We can choose $\gamma = 2\ln(1/\alpha) - O(1)$ and ignore some
lower-order terms to get roughly $\sigma \le \alpha^{2k}$. On the other hand, it is known that Grover's search
algorithm with $\alpha\sqrt{n}$ queries on an n-bit input has success probability roughly $\alpha^2$ [BBHT98]. Doing
such a search on all k instances gives overall success probability $\alpha^{2k}$.

Theorem 14 (SQDPT for OR) There exist $\alpha, \gamma > 0$ such that every quantum algorithm for
$\mathrm{OR}_n^{(k)}$ with $T \le \alpha k\sqrt{n}$ queries has success probability $\sigma \le 2^{-\gamma k}$.

Proof. An algorithm A for $\mathrm{OR}_n^{(k)}$ with success probability $\sigma$ can be used to build an algorithm A'
for $\mathrm{Search}_n^{(k)}$ with slightly worse success probability:

1. Run A on the original input and remember which blocks contain a 1.

2. Run simultaneously (at most k) binary searches on the nonzero blocks. Iterate this
$s = 2\log(1/\alpha)$ times. Each iteration is computed by running A on the parts of the blocks that
are known to contain a 1, halving the remaining instance size each time.

3. Run the exact version of Grover's algorithm on each of the remaining parts of the instances
to look for a one there (each remaining part has size $n/2^s$).



This new algorithm A' uses $(s+1)T + \frac{\pi}{4} k\sqrt{n/2^s} = O(\alpha \log(1/\alpha) k\sqrt{n})$ queries. With probability at
least $\sigma^{s+1}$, A succeeds in all iterations, in which case A' solves $\mathrm{Search}_n^{(k)}$. By the previous theorem,
for every $\gamma' > 0$ of our choice we can choose $\alpha > 0$ such that

$\sigma^{s+1} \le 2^{-\gamma' k}$

which implies the theorem with $\gamma = \gamma'/(s+1)$. □
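The three steps of the reduction can be sketched classically. This is only an illustration under stated assumptions: `or_oracle` is a hypothetical, error-free stand-in for the algorithm A (it answers all k block ORs at once), and a linear scan stands in for exact Grover search in step 3. Neither models the quantum query cost.

```python
from typing import List, Optional, Sequence, Tuple

Range = Tuple[int, int]  # half-open index interval [lo, hi) inside one block


def or_oracle(blocks: Sequence[Sequence[int]], ranges: Sequence[Range]) -> List[bool]:
    """Hypothetical stand-in for A: the OR of every block restricted to its range."""
    return [any(block[lo:hi]) for block, (lo, hi) in zip(blocks, ranges)]


def search_via_or(blocks: Sequence[Sequence[int]], s: int) -> List[Optional[int]]:
    """Return, for each block, the index of some 1 in it (None for all-zero blocks)."""
    n = len(blocks[0])
    ranges: List[Range] = [(0, n)] * len(blocks)
    nonzero = or_oracle(blocks, ranges)  # step 1: which blocks contain a 1
    for _ in range(s):  # step 2: s simultaneous halving rounds
        left = [(lo, (lo + hi + 1) // 2) for lo, hi in ranges]
        hit = or_oracle(blocks, left)
        # Keep the left half if it contains a 1, otherwise move to the right half.
        ranges = [lft if h else (lft[1], hi) for lft, h, (_, hi) in zip(left, hit, ranges)]
    found: List[Optional[int]] = []
    for block, nz, (lo, hi) in zip(blocks, nonzero, ranges):  # step 3: finish the search
        found.append(next(i for i in range(lo, hi) if block[i]) if nz else None)
    return found
```

The sketch calls the oracle s + 1 times, matching the factor (s + 1) in the query count of A'.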

Choosing our parameters carefully, we can actually show that for every $\gamma < 1$ there is an $\alpha > 0$
such that $\alpha k\sqrt{n}$ queries give success probability $\sigma \le 2^{-\gamma k}$. Clearly, $\sigma = 2^{-k}$ is achievable without
any queries by random guessing.
4.3 A bound for all functions



As in Section 3.4, we can extend the strong direct product theorem for OR to a slightly weaker
theorem for all total functions. Block sensitivity is closely related to bounded-error quantum query
complexity:

Theorem 15 ([BBC+01]) $Q_2(f) = \Omega(\sqrt{bs(f)})$ for all f, $D(f) \le bs(f)^3$ for all total Boolean f.

By embedding an OR of size bs(f) in f, we obtain

Theorem 16 There exist $\alpha, \gamma > 0$ such that for every f, every quantum algorithm for $f^{(k)}$ with
$T \le \alpha k\sqrt{bs(f)}$ queries has success probability $\sigma \le 2^{-\gamma k}$.

This is close to optimal whenever $Q_2(f) = \Theta(\sqrt{bs(f)})$. For total functions, the gap between
$Q_2(f)$ and bs(f) is no more than a 6th power, hence

Corollary 17 There exist $\alpha, \gamma > 0$ such that for every total Boolean f, every quantum algorithm
for $f^{(k)}$ with $T \le \alpha k Q_2(f)^{1/6}$ queries has success probability $\sigma \le 2^{-\gamma k}$.
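For very small functions, the block sensitivity that appears in Theorem 16 can be computed by brute force. The sketch below assumes the standard definition (the maximum, over inputs x, of the number of pairwise disjoint blocks of variables whose flipping changes f(x)). The function name and the bitmask encoding of inputs are choices made for this illustration, and the running time is exponential in n.

```python
from typing import Callable


def block_sensitivity(f: Callable[[int], int], n: int) -> int:
    """Brute-force bs(f) for f given on n-bit integers (bit i = variable i)."""
    full = (1 << n) - 1
    best = 0
    for x in range(1 << n):
        # Every nonempty block (as a bitmask) whose flipping changes f(x).
        sensitive = [b for b in range(1, 1 << n) if f(x ^ b) != f(x)]

        def pack(avail: int, start: int) -> int:
            # Largest number of pairwise disjoint sensitive blocks inside `avail`.
            best_here = 0
            for idx in range(start, len(sensitive)):
                b = sensitive[idx]
                if b & ~avail == 0:
                    best_here = max(best_here, 1 + pack(avail & ~b, idx + 1))
            return best_here

        best = max(best, pack(full, 0))
    return best


def or3(x: int) -> int:
    return int(x != 0)


def maj3(x: int) -> int:
    return int(bin(x).count("1") >= 2)
```

For example, `block_sensitivity(or3, 3)` is 3 (at the all-zero input every single bit is a sensitive block), so Theorem 16 applies to OR on 3 bits with the threshold $\alpha k\sqrt{3}$.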

5 Strong Direct Product Theorem for Quantum Communication 

In this section we establish a strong direct product theorem for quantum communication complexity, 
specifically for protocols that compute k independent instances of the Disjointness problem. Our 
proof relies crucially on the beautiful technique that Razborov introduced to establish a lower bound 
on the quantum communication complexity of (one instance of) Disjointness [Raz03]. It allows us 
to translate a quantum communication protocol to a single-variate polynomial that represents, 
roughly speaking, the protocol's acceptance probability as a function of the size of the intersection 
of x and y. Once we have this polynomial, the results from Section 4.1 suffice to establish a strong
direct product theorem. 

5.1 Razborov's technique 

Razborov's technique relies on the following linear algebraic notions. The operator norm $\|A\|$ of
a matrix A is its largest singular value $\sigma_1$. The trace inner product between A and B is
$\langle A, B \rangle = \mathrm{Tr}(A^*B)$. The trace norm is $\|A\|_{tr} = \max\{|\langle A, B \rangle| : \|B\| = 1\} = \sum_i \sigma_i$.
The Frobenius norm is $\|A\|_F = \sqrt{\sum_{ij} |A_{ij}|^2} = \sqrt{\sum_i \sigma_i^2}$. The following lemma is implicit in Razborov's paper.

Lemma 18 Consider a Q-qubit quantum communication protocol on N-bit inputs x and y, with
acceptance probabilities denoted by P(x, y). Define $P(i) = E_{|x|=|y|=N/4, |x \wedge y|=i}[P(x, y)]$, where the
expectation is taken uniformly over all x, y that each have weight N/4 and that have intersection i.
For every $d \le N/4$ there exists a degree-d polynomial q such that $|P(i) - q(i)| \le 2^{-d/4+2Q}$ for all
$i \in \{0, \ldots, N/8\}$.

Proof. We only consider the $\mathcal{N} = \binom{N}{N/4}$ strings of weight N/4. Let P denote the $\mathcal{N} \times \mathcal{N}$ matrix
of the acceptance probabilities on these inputs. We know from Yao and Kremer [Yao93, Kre95]
that we can decompose P as a matrix product P = AB, where A is an $\mathcal{N} \times 2^{2Q-2}$ matrix with
each entry at most 1 in absolute value, and similarly for B. Note that $\|A\|_F, \|B\|_F \le \sqrt{\mathcal{N} 2^{2Q-2}}$.
Using Hölder's inequality we have:

$\|P\|_{tr} \le \|A\|_F \cdot \|B\|_F \le \mathcal{N} 2^{2Q-2}$

Let $\mu_i$ denote the $\mathcal{N} \times \mathcal{N}$ matrix corresponding to the uniform probability distribution on
$\{(x, y) : |x \wedge y| = i\}$. These "combinatorial matrices" have been well studied [Knu03]. Note that $\langle P, \mu_i \rangle$ is the
expected acceptance probability P(i) of the protocol under that distribution. One can show that the
different $\mu_i$ commute, so they have the same eigenspaces $E_0, \ldots, E_{N/4}$ and can be simultaneously
diagonalized by some orthogonal matrix U. For $t \in \{0, \ldots, N/4\}$, let $(UPU^T)_t$ denote the block of
$UPU^T$ corresponding to $E_t$, and $a_t = \mathrm{Tr}((UPU^T)_t)$ be its trace. Then we have

$\sum_{t=0}^{N/4} |a_t| \le \sum_{j=1}^{\mathcal{N}} |(UPU^T)_{jj}| \le \|UPU^T\|_{tr} = \|P\|_{tr} \le \mathcal{N} 2^{2Q-2}$

where the second inequality is a property of the trace norm. 

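Let $\lambda_{it}$ be the eigenvalue of $\mu_i$ in eigenspace $E_t$. It is known [Raz03, Section 5.3] that $\lambda_{it}$ is a
degree-t polynomial in i, and that $|\lambda_{it}| \le 2^{-t/4}/\mathcal{N}$ for $i \le N/8$ (the factor 1/4 in the exponent is
implicit in Razborov's paper). Consider the high-degree polynomial p defined by

$p(i) = \sum_{t=0}^{N/4} a_t \lambda_{it}$.

This satisfies

$p(i) = \sum_{t=0}^{N/4} \mathrm{Tr}((UPU^T)_t) \lambda_{it} = \langle UPU^T, U\mu_iU^T \rangle = \langle P, \mu_i \rangle = P(i)$.

Let q be the degree-d polynomial obtained by removing the high-degree parts of p:

$q(i) = \sum_{t=0}^{d} a_t \lambda_{it}$.

Then P and q are close on all integers i between 0 and N/8:

$|P(i) - q(i)| = |p(i) - q(i)| = \left|\sum_{t=d+1}^{N/4} a_t \lambda_{it}\right| \le \frac{2^{-d/4}}{\mathcal{N}} \sum_{t=0}^{N/4} |a_t| \le 2^{-d/4+2Q}$.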

□ 



5.2 Consequences for quantum protocols 

Combining Razborov's technique with our polynomial bounds we can prove 

Theorem 19 (SQDPT for Disjointness) There exist $\alpha, \gamma > 0$ such that every quantum protocol
for $\mathrm{DISJ}_n^{(k)}$ with $Q \le \alpha k\sqrt{n}$ qubits of communication has success probability $p \le 2^{-\gamma k}$.

Proof (sketch). By doing the same trick with $s = 2\log(1/\alpha)$ rounds of binary search as for
Theorem 14, we can tweak a protocol for $\mathrm{DISJ}_n^{(k)}$ to a protocol that satisfies, with P(i) defined as
in Lemma 18, N = kn and $\sigma = p^{s+1}$:

$P(i) = 0$ if $i \in \{0, \ldots, k-1\}$
$P(k) \ge \sigma$
$P(i) \in [0, 1]$ for all $i \in \{0, \ldots, N\}$

(a subtlety: instead of exact Grover we use an exact version of the $O(\sqrt{n})$-qubit Disjointness
protocol of [AA03]; the [BCW98]-protocol would lose a log n-factor). Lemma 18, using d = 12Q,
then gives a degree-d polynomial q that differs from P by at most $\delta \le 2^{-12Q/4+2Q} = 2^{-Q}$ on all $i \in \{0, \ldots, N/8\}$.
This $\delta$ is sufficiently small to apply Lemma 11, which in turn upper bounds $\sigma$ and hence p. □

This technique also gives strong direct product theorems for symmetric predicates other than $\mathrm{DISJ}_n$.

6 Time-Space Tradeoff for Quantum Sorting 

We will now use our strong direct product theorem to get near-optimal time-space tradeoffs for
quantum circuits for sorting. This follows Klauck [Kla03], who described an upper bound
$T^2S = O((N \log N)^3)$ and a lower bound $TS = \Omega(N^{3/2})$. In our model, the numbers $a_1, \ldots, a_N$ that we
want to sort can be accessed by means of queries, and the number of queries lower bounds the actual
time taken by the circuit. The circuit has N output gates and in the course of its computation
outputs the N numbers in sorted (say, descending) order, with success probability at least 2/3.

Theorem 20 Every bounded-error quantum circuit for sorting N numbers that uses T queries and
S qubits of workspace satisfies $T^2S = \Omega(N^3)$.

Proof. We "slice" the circuit along the time-axis into $L = T/\alpha\sqrt{SN}$ slices, each containing
$T/L = \alpha\sqrt{SN}$ queries. Each such slice has a number of output gates. Consider any slice. Suppose
it contains output gates i, i + 1, . . . , i + k − 1, for $i \le N/2$, so it is supposed to output the i-th up
to (i + k − 1)-th largest elements of its input. We want to show that k = O(S). If $k \le S$ then we are
done, so assume k > S. We can use the slice as a k-threshold algorithm on N/2 bits, as follows.
For an N/2-bit input x, construct a sorting input by taking i − 1 copies of the number 2, the N/2
bits in x, and N/2 − i + 1 copies of the number 0, and append their position behind the numbers.
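The construction of the sorting input can be made concrete with a small classical sketch. The function names are hypothetical, and Python's built-in sort stands in for the (correct) sorting circuit. Numbers are modelled as (value, position) pairs so that appending the position makes all N numbers distinct.

```python
from typing import List, Sequence, Tuple

Item = Tuple[int, int]  # (value, position): the position is appended behind the value


def sorting_input(x: Sequence[int], i: int) -> List[Item]:
    """i - 1 copies of 2, the N/2 bits of x, then N/2 - i + 1 copies of 0."""
    half = len(x)  # N/2
    values = [2] * (i - 1) + list(x) + [0] * (half - i + 1)
    return [(v, pos) for pos, v in enumerate(values)]


def slice_outputs(items: Sequence[Item], i: int, k: int) -> List[Item]:
    """What output gates i, ..., i+k-1 should produce (descending order)."""
    ranked = sorted(items, reverse=True)
    return ranked[i - 1 : i - 1 + k]


def threshold_answer(x: Sequence[int], i: int, k: int) -> int:
    """1 iff all k outputs of the slice are numbers starting with 1."""
    return int(all(v == 1 for v, _ in slice_outputs(sorting_input(x, i), i, k)))
```

With x = [1, 0, 1, 0], i = 2 and k = 2, the outputs of the slice are exactly the two ones of x, so `threshold_answer` returns 1. With fewer than k ones in x, some output starts with 0 and the answer is 0.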

Consider the behavior of the sorting circuit on this input. The first part of the circuit has
to output the i − 1 largest numbers, which all start with 2. We condition on the event that the
circuit succeeds in this. It then passes on an S-qubit state (possibly mixed) as the starting state
of the particular slice we are considering. This slice then outputs the k largest numbers in x with
probability at least 2/3. Now, consider an algorithm that runs just this slice, starting with the
completely mixed state on S qubits, and that outputs 1 if it finds k numbers starting with 1,
and outputs 0 otherwise. If |x| < k this new algorithm always outputs 0 (note that it can verify
finding a 1 since its position is appended), but if |x| = k then it outputs 1 with probability at least
$\sigma \ge \frac{2}{3} \cdot 2^{-S}$, because the completely mixed state has "overlap" $2^{-S}$ with the "good" S-qubit state
that would have been the starting state of the slice in the run of the sorting circuit. On the other
hand, the slice has only $\alpha\sqrt{SN} < \alpha\sqrt{kN}$ queries, so by choosing $\alpha$ sufficiently small, Theorem 12
implies $\sigma \le 2^{-\Omega(k)}$. Combining our upper and lower bounds on $\sigma$ gives k = O(S). Thus we need
$L = \Omega(N/S)$ slices, so $T = L\alpha\sqrt{SN} = \Omega(N^{3/2}/\sqrt{S})$, which gives $T^2S = \Omega(N^3)$. □

As mentioned, our tradeoff is achievable up to polylog factors [Kla03]. Interestingly, the near- 
optimal algorithm uses only a polylogarithmic number of qubits and otherwise just classical memory. 
For simplicity we have shown the lower bound for the case when the outputs have to be made in 
their natural ordering only, but we can show the same lower bound for any ordering of the outputs 
that does not depend on the input using a slightly different proof. 

7 Time-Space Tradeoffs for Boolean Matrix Products 

First we show a lower bound on the time-space tradeoff for Boolean matrix-vector multiplication 
on classical machines. 

Theorem 21 There is a matrix A such that every classical bounded-error circuit that computes the
Boolean matrix-vector product Ab with T queries and space S = o(N/log N) satisfies $TS = \Omega(N^2)$.

The bound is tight if T measures queries to the input.

Proof. Fix k = O(S) large enough. First we have to find a hard matrix A. We pick A randomly by
setting N/(2k) random positions in each row to 1. We want to show that with positive probability
for all sets of k rows $A[i_1], \ldots, A[i_k]$, many of the rows $A[i_j]$ contain at least N/(6k) ones that are
not ones in any of the k − 1 other rows.

This probability can be bounded as follows. We will treat the rows as subsets of {1, . . . , N}. A
row A[j] is called bad with respect to k − 1 other rows $A[i_1], \ldots, A[i_{k-1}]$, if $|A[j] - \bigcup_\ell A[i_\ell]| \le N/(6k)$.
For fixed $i_1, \ldots, i_{k-1}$, the probability that some A[j] is bad with respect to the k − 1 other rows
is at most $e^{-\Omega(N/k)}$ by the Chernoff bound and the fact that k rows can together contain at most
N/2 elements. Since k = o(N/log N) we may assume this probability is at most $1/N^{10}$.

Now fix any set $I = \{i_1, \ldots, i_k\}$. The probability that for $j \in I$ it holds that A[j] is bad with
respect to the other rows is at most $1/N^{10}$, and this also holds if we condition on the event that
some other rows are bad, since this condition makes it only less probable that another row is also
bad. So for any fixed $J \subset I$ of size k/2 the probability that all rows in J are bad is at most $N^{-5k}$,
and the probability that there exists such J is at most

$\binom{k}{k/2} N^{-5k} \le N^{-4k}$.

Furthermore the probability that there is a set I of k rows for which k/2 are bad is at most

$\binom{N}{k} N^{-4k} \le N^{-3k} < 1$.

So there is an A as required and we may fix one.

Now suppose we are given a circuit with space S that computes the Boolean product between
the rows of A and b in some order. We again proceed by "slicing" the circuit into $L = T/\alpha N$
slices, each containing $T/L = \alpha N$ queries. Each such slice has a number of output gates. Consider
any slice. Suppose it contains output gates $i_1 < \cdots < i_k \le N/2$, so it is supposed to output
$\bigvee_{\ell=1}^{N} (A[i_j, \ell] \wedge b_\ell)$ for all $i_j$ with $1 \le j \le k$.
Such a slice starts on a classical value of the "memory" of the circuit, which is in general
a probability distribution on S bits (if the circuit is randomized). We replace this probability
distribution by the uniform distribution on the possible values of S bits. If the original circuit
succeeds in computing the function correctly with probability at least 1/2, then so does the circuit
slice with its outputs, and replacing the initial value of the memory by a uniformly random one
decreases the success probability to no less than $(1/2) \cdot 1/2^S$.

If we now show that any classical circuit with $\alpha N$ queries that produces the outputs $i_1, \ldots, i_k$
can succeed only with exponentially small probability in k, we get that k = O(S), and hence
$(T/\alpha N) \cdot O(S) \ge N$, which gives the claimed lower bound for the time-space tradeoff.

Each set of k outputs corresponds to k rows of A, which contain N/(2k) ones each. Thanks
to the construction of A there are k/2 rows among these, such that N/(6k) of the ones in each
such row are in positions where none of the other rows contains a one. So we get k/2 sets of N/(6k)
positions that are unique to each of the k/2 rows. The inputs for b will be restricted to contain ones
only at these positions, and so the algorithm naturally has to solve k/2 independent OR problems
on n = N/(6k) bits each. By Theorem 3, this is only possible with $\alpha N$ queries if the success
probability is exponentially small in k. □

An analogous construction can be done in the quantum case. Using circuit slices of
length $\alpha\sqrt{NS}$ we can prove:

Theorem 22 There is a matrix A such that every quantum bounded-error circuit that computes the
Boolean matrix-vector product Ab with T queries and space S = o(N/log N) satisfies $T^2S = \Omega(N^3)$.

Note that this is tight within a logarithmic factor (needed to improve the success probability
of Grover search).

Theorem 23 Every classical bounded-error circuit that computes the Boolean matrix product AB
with T queries and space S satisfies $TS = \Omega(N^3)$.

While this is near-optimal for small S, it is probably not tight for large S, a likely tight
tradeoff being $T^2S = \Omega(N^6)$. It is also no improvement compared to Abrahamson's average-case
bounds [Abr90].

Proof. Suppose that S = o(N), otherwise the bound is trivial, since time $N^2$ is always needed.
We can proceed similarly to the proof of Theorem 21. We slice the circuit so that each slice has only
$\alpha N$ queries. Suppose a slice makes k outputs. We are going to restrict the inputs to get a direct
product problem with k instances of size N/k each, hence a slice with $\alpha N$ queries has exponentially
small success probability in k and can thus produce only O(S) outputs. Since the overall number
of outputs is $N^2$ we get the tradeoff $TS = \Omega(N^3)$.

Suppose a circuit slice makes k outputs, where an output labeled (i, j) needs to produce the
vector product of the i-th row A[i] of A and the j-th column B[j] of B. We may partition the set
{1, . . . , N} into k mutually disjoint subsets U(i, j) of size N/k, each associated to an output (i, j).

Assume that there are $\ell$ outputs $(i, j_1), \ldots, (i, j_\ell)$ involving A[i]. Each such output is associated
to a subset $U(i, j_t)$, and we set A[i] to zero on all positions that are not in any of these subsets,
and to one on all positions that are in one of these. When there are $\ell$ outputs $(i_1, j), \ldots, (i_\ell, j)$
involving B[j], we set B[j] to zero on all positions that are not in any of the corresponding subsets,
and allow the inputs to be arbitrary on the other positions.






If the circuit computes on these restricted inputs, it actually has to compute k instances of OR
of size n = N/k in B, for it is true that A[i] and B[j] contain a single block of size N/k in which
A[i] contains only ones, and B[j] "free" input bits, if and only if (i, j) is one of the k outputs.
Hence the strong direct product theorem is applicable. □
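The restriction of the inputs used in this proof can be illustrated for small N. The sketch below uses hypothetical names, 0-based indices, and assumes k divides N. It assigns the t-th output (i, j) the block U(i, j) = {t·N/k, ..., (t+1)·N/k − 1}, sets A[i] to one on its blocks, and places the free bits of column j of B on U(i, j).

```python
from typing import Dict, List, Sequence, Tuple

Output = Tuple[int, int]  # an output label (i, j): row i of A, column j of B


def restricted_instance(
    N: int, outputs: Sequence[Output], free_bits: Dict[Output, Sequence[int]]
) -> Tuple[List[List[int]], List[List[int]]]:
    """Build restricted N x N matrices A and B for k = len(outputs) outputs."""
    size = N // len(outputs)  # block size N/k
    A = [[0] * N for _ in range(N)]
    B = [[0] * N for _ in range(N)]  # B[p][j] is the entry in row p, column j
    for t, (i, j) in enumerate(outputs):
        for offset in range(size):
            p = t * size + offset  # a position in the block U(i, j)
            A[i][p] = 1  # row i is all ones on its blocks
            B[p][j] = free_bits[(i, j)][offset]  # column j is free on U(i, j)
    return A, B


def boolean_product_entry(A: List[List[int]], B: List[List[int]], i: int, j: int) -> int:
    """Entry (i, j) of the Boolean matrix product AB."""
    return int(any(A[i][p] and B[p][j] for p in range(len(A))))
```

On such inputs, entry (i, j) of AB equals the OR of the free bits placed in U(i, j), because the support of row A[i] and the support of column B[j] overlap only on that block.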

The application to the quantum case is analogous. 

Theorem 24 Every quantum bounded-error circuit that computes the Boolean matrix product AB
with T queries and space S satisfies $T^2S = \Omega(N^5)$.

If S = O(log N), then $N^2$ applications of Grover can compute AB with $T = O(N^{2.5} \log N)$. Hence
our tradeoff is near-optimal for small S. We do not know whether it is optimal for large S.

8 Quantum Communication- Space Tradeoffs for Matrix Products 

In this section we use the strong direct product result for quantum communication (Theorem 19) 
to prove communication-space tradeoffs. We later show that these are close to optimal. 

Theorem 25 Every quantum bounded-error protocol in which Alice and Bob have bounded space
S and that computes the Boolean matrix-vector product, satisfies $C^2S = \Omega(N^3)$.

Proof. In a protocol, Alice receives a matrix A, and Bob a vector b as inputs. Given a circuit
that multiplies these with communication C and space S, we again proceed to slice it. This
time, however, a slice contains a limited amount of communication. Recall that in communicating
quantum circuits the communication corresponds to wires carrying qubits that cross between Alice's
and Bob's circuits. Hence we may cut the circuit after $\alpha\sqrt{NS}$ qubits have been communicated
and so on. Overall there are $C/\alpha\sqrt{NS}$ circuit slices. Each starts with an initial state that may
be replaced by the completely mixed state at the cost of decreasing the success probability to
$(1/2) \cdot 1/2^S$. We want to employ the direct product theorem for quantum communication complexity
to show that a protocol with the given communication has success probability at most exponentially
small in the number of outputs it produces, and so a slice can produce at most O(S) outputs.
Combining these bounds with the fact that N outputs have to be produced gives the tradeoff.

To use the direct product theorem we restrict the inputs in the following way: Suppose a
protocol makes k outputs. We partition the vector b into k blocks of size N/k, and each block
is assigned to one of the k rows of A for which an output is made. This row is made to contain
zeroes outside of the positions belonging to its block, and hence we arrive at a problem where
Disjointness has to be computed on k instances of size N/k. With communication $\alpha\sqrt{kN}$, the
success probability must be exponentially small in k due to Theorem 19. Hence k = O(S) is an
upper bound on the number of outputs produced. □

Theorem 26 Every quantum bounded-error protocol in which Alice and Bob have bounded space
S and that computes the Boolean matrix product, satisfies $C^2S = \Omega(N^5)$.

Proof. The proof uses the same slicing approach as in the other tradeoff results. Note that we can
assume that S = o(N), since otherwise the bound is trivial. Each slice contains communication
$\alpha\sqrt{NS}$, and as before a direct product result showing that k outputs can be computed only with
success probability exponentially small in k leads to the conclusion that a slice can only compute
O(S) outputs. Therefore $(C/\alpha\sqrt{NS}) \cdot O(S) \ge N^2$, and we are done.

Consider a protocol with $\alpha\sqrt{NS}$ qubits of communication. We partition the universe {1, . . . , N}
of the Disjointness problems to be computed into k mutually disjoint subsets U(i, j) of size N/k,
each associated to an output (i, j), which in turn corresponds to a row/column pair A[i], B[j] in
the input matrices A and B. Assume that there are $\ell$ outputs $(i, j_1), \ldots, (i, j_\ell)$ involving A[i]. Each
output is associated to a subset of the universe $U(i, j_t)$, and we set A[i] to zero on all positions that
are not in one of these subsets. Then we proceed analogously with the columns of B.

If the protocol computes on these restricted inputs, it has to solve k instances of Disjointness
of size n = N/k each, since A[i] and B[j] contain a single block of size N/k in which both are not
set to 0 if and only if (i, j) is one of the k outputs. Hence Theorem 19 is applicable. □

We now want to show that these tradeoffs are not too far from optimal. 

Theorem 27 There is a quantum bounded-error protocol with space S that computes the Boolean
product between a matrix and a vector within communication $C = O((N^{3/2} \log^2 N)/\sqrt{S})$.

There is a quantum bounded-error protocol with space S that computes the Boolean product
between two matrices within communication $C = O((N^{5/2} \log^2 N)/\sqrt{S})$.

Proof. We begin by showing a protocol for the following scenario: Alice gets S N-bit vectors
$x_1, \ldots, x_S$, Bob gets an N-bit vector y, and they want to compute the S Boolean inner products
between these vectors. The protocol uses space O(S).

In the following, we interpret Boolean vectors as sets. The main idea is that Alice can use the
union z of the $x_i$ and then Alice and Bob can find an element in the intersection of z and y using the
protocol for the Disjointness problem described in [BCW98]. Alice then marks all $x_i$ that contain
this element and removes them from z.

A problem with this approach is that Alice cannot store z explicitly, since it might contain
many more than S elements. Alice may, however, store the indices of those sets $x_i$ for which an
element in the intersection of $x_i$ and y has already been found, in an array of length S. This array
and the input given as an oracle work as an implicit representation of z.

Now suppose at some point during the protocol the intersection of z and y has size k. Then
Alice and Bob can find one element in this intersection within $O(\sqrt{N/k})$ rounds of communication
in which O(log N) qubits are exchanged each. Furthermore in $O(\sqrt{Nk})$ rounds all elements in
the intersection can be found. So if $k \le S$, then all elements are found within communication
$O(\sqrt{NS} \log N)$ and the problem can be solved completely. On the other hand, if k > S, finding
one element costs $O(\sqrt{N/S} \log N)$, but finding such an element removes at least one $x_i$ from z,
and hence this has to be done at most S times, giving the same overall communication bound.

It is not hard to see that this process can be implemented with space O(S). The protocol from
[BCW98] is a distributed Grover search that itself uses only space O(log N). Bob can work as in
this protocol. For each search, Alice has to start with a superposition over all indices in z. This
superposition can be computed from her oracle and her array. In each step she has to apply the
Grover iteration. This can also be implemented from these two resources.

To get a protocol for matrix-vector product, the above procedure is repeated N/S times, hence
the communication is $O((N/S) \cdot \sqrt{NS} \log^2 N)$, where one logarithmic factor stems from improving
success probability to 1/poly(N).
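The bookkeeping of the protocol for S inner products (the array of found indices and the implicit set z) can be sketched classically. This is an illustration under stated assumptions: `find_intersection` is a hypothetical stand-in for the distributed Grover search of [BCW98], and for simplicity the simulation builds z explicitly, which the actual protocol avoids.

```python
from typing import List, Optional, Sequence, Set, Tuple


def find_intersection(z: Set[int], y: Set[int]) -> Optional[int]:
    """Hypothetical stand-in for the distributed search: some element of z and y."""
    common = z & y
    return min(common) if common else None


def inner_products(xs: Sequence[Set[int]], y: Set[int]) -> Tuple[List[bool], int]:
    """Boolean inner products of each x_i with y, and the number of searches used."""
    found = [False] * len(xs)  # Alice's array of length S
    searches = 0
    while True:
        # z is the union of the x_i that are not yet marked as found.
        z: Set[int] = set().union(*(x for x, done in zip(xs, found) if not done))
        searches += 1
        element = find_intersection(z, y)
        if element is None:
            return found, searches
        for idx, x in enumerate(xs):
            if element in x:
                found[idx] = True  # mark x_i, which removes it from z
```

Every successful search marks at least one previously unmarked $x_i$, so the loop performs at most S + 1 searches, in line with the "at most S times" count in the proof.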
For the product of two matrices, the matrix-vector protocol may be repeated N times. □
These near-optimal protocols use only O(log N) "real" qubits, and otherwise just classical memory.



9 Open Problems 

We mention some open problems. The first is to determine tight time-space tradeoffs for Boolean 
matrix product on both classical and quantum computers. Second, regarding communication-space 
tradeoffs for Boolean matrix-vector and matrix product, we did not prove any classical bounds that
were better than our quantum bounds. Klauck [Kla04] recently proved classical tradeoffs
$CS^2 = \Omega(N^3)$ and $CS^2 = \Omega(N^2)$ for Boolean matrix product and matrix-vector product, respectively, by
means of a weak direct product theorem for Disjointness. A classical strong direct product theorem 
for Disjointness would imply optimal tradeoffs, but we do not know how to prove this at the moment. 
Finally, it would be interesting to get any lower bounds on time-space or communication-space 
tradeoffs for decision problems in the quantum case, for example for Element Distinctness [BDH+01, 
Amb03] or the verification of matrix multiplication [BS04]. 